DEMB: Cache-Aware Scheduling for Distributed Query Processing
Authors
Abstract
Leveraging data in distributed caches for large-scale query processing applications is becoming more important, given the current trend toward building large, scalable distributed systems by connecting many heterogeneous, less powerful machines rather than purchasing expensive, homogeneous, and very powerful ones. As more servers are added to such clusters, more memory becomes available for caching data objects across the distributed machines. However, the cached objects are dispersed, and traditional query scheduling policies that consider only load balancing do not effectively utilize the increased cache space. We propose a new multi-dimensional range query scheduling policy for distributed query processing frameworks, called DEMB, that employs a probability distribution estimate derived from recent queries. DEMB accounts for both load balancing and the availability of distributed cached objects in order to improve the cache hit rate for queries, and thereby decrease query turnaround time and increase throughput. We experimentally demonstrate that DEMB produces better query plans and lower query response times than other query scheduling policies.
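The idea of scheduling from a distribution estimated over recent queries can be illustrated with a small sketch. This is not the DEMB algorithm itself, whose details the abstract does not give. It is a simplified one-dimensional stand-in that assumes each query has already been reduced to a single numeric key. The class name `BoundaryScheduler`, its methods, and the `window_size` parameter are all hypothetical. The sketch keeps a sliding window of recent keys and cuts the key space at empirical quantiles, so each server receives roughly equal probability mass (load balance) while nearby keys keep going to the same server (cache locality).

```python
from bisect import bisect_right
from collections import deque


class BoundaryScheduler:
    """Toy 1-D scheduler (hypothetical, not the paper's implementation).

    Partitions the key space into one contiguous interval per server,
    with cut points taken from the empirical distribution of recent keys.
    """

    def __init__(self, num_servers, window_size=1000):
        self.num_servers = num_servers
        # Sliding window of the most recent query keys.
        self.recent = deque(maxlen=window_size)
        # Sorted cut points; at most num_servers - 1 of them.
        self.boundaries = []

    def observe(self, key):
        """Record a query key in the sliding window."""
        self.recent.append(key)

    def rebuild(self):
        """Recompute cut points as equal-mass quantiles of the window."""
        ordered = sorted(self.recent)
        n = len(ordered)
        if n == 0:
            self.boundaries = []
            return
        self.boundaries = [
            ordered[(i * n) // self.num_servers]
            for i in range(1, self.num_servers)
        ]

    def assign(self, key):
        """Return the index of the server whose interval contains key."""
        return bisect_right(self.boundaries, key)
```

With a uniform stream of keys the intervals come out equally wide. With a skewed stream, the hot region is split across more servers, while each server still sees a contiguous and therefore cache-friendly range. A real multi-dimensional policy would also need some way to map range queries onto such keys, which this sketch leaves out.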
Similar resources
Multiple query scheduling for distributed semantic caches
In distributed query processing systems, load balancing plays an important role in maximizing system throughput. When queries can leverage cached intermediate results, improving the cache hit ratio becomes as important as load balancing in query scheduling, especially when dealing with computationally expensive queries. The scheduling policies must be designed to take into consideration the dyn...
Cooperative caching for grid-enabled OLAP
In this paper, we propose a grid-based On-Line Analytical Processing (OLAP) application which distributes query computation across an enterprise grid. Our application follows a two-tiered process for answering queries based on sharing Cached OLAP data between the users at the local grid site and using grid scheduling approaches to execute the remaining parts of a query amongst a distributed set...
EM-KDE: A locality-aware job scheduling policy with distributed semantic caches
In modern query processing systems, the caching facilities are distributed and scale with the number of servers. To maximize the overall system throughput, the distributed system should balance the query loads among servers and also leverage cached results. In particular, leveraging distributed cached data is becoming more important as many systems are being built by connecting many small heter...
Efficient Distributed Top-k Query Processing with Caching
Recently, there has been an increased interest in incorporating in database management systems rank-aware query operators, such as top-k queries, that allow users to retrieve only the most interesting data objects. In this paper, we propose a cache-based approach for efficiently supporting top-k queries in distributed database management systems. In large distributed systems, the query performa...
An Optimizing Query Processor with an Efficient Caching Mechanism for Distributed Databases
This paper provides an efficient way of querying among many distributed and heterogeneous data sources. We describe a database optimization framework that supports data and computation reuse, query scheduling and caching mechanism to speed up the evaluation of multiquery workload. The Caching query result is stored as an eXtensible Markup Language (XML) document. An XML oriented common data mod...